Fast forward to the bottom of this document for the map.
The model uses the EcoDes-DK15 dataset to predict forest quality. Forest-quality annotations were derived from polygons supplied by several Danish agencies:
| forest quality | annotation source |
|---|---|
| high | §15 forest polygons |
| high | §25 forest polygons |
| high | private old growth |
| low | non-§25 ("ikke §25") forest polygons |
| low | NST plantation polygons |
The polygons were pooled into two groups by forest quality (high and low). To provide a balanced training dataset, we randomly subsampled the low-quality polygons down to n = 10k, which gave a dataset of ~20k polygons. For each polygon we extracted zonal statistics (mean and sd, weighted by cell area) for all EcoDes-DK15 variables.
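The balancing and the area weighting can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline that produced the results: it assumes each cell's weight is the part of that cell lying inside the polygon, that "sd" means the population (not sample) weighted standard deviation, and that subsampling is a plain random draw without replacement. The helpers `subsample` and `area_weighted_stats` and the example numbers are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsample(items: Sequence[T], n: int, seed: int = 42) -> List[T]:
    """Randomly keep n items without replacement; return all items if there are n or fewer."""
    if len(items) <= n:
        return list(items)
    return random.Random(seed).sample(list(items), n)


def area_weighted_stats(values: Sequence[float], areas: Sequence[float]) -> Tuple[float, float]:
    """Area-weighted mean and population standard deviation of the raster cells in one polygon.

    areas[i] is the part of cell i that lies inside the polygon (assumption), so cells
    only partly covered by the polygon count proportionally less.
    """
    if not values or len(values) != len(areas):
        raise ValueError("values and areas must be non-empty and the same length")
    total = sum(areas)
    if total <= 0:
        raise ValueError("the polygon must cover a positive area")
    mean = sum(v * a for v, a in zip(values, areas)) / total
    variance = sum(a * (v - mean) ** 2 for v, a in zip(values, areas)) / total
    return mean, math.sqrt(variance)


# Hypothetical polygon covering two full 10 m x 10 m cells and half of a third cell.
mean, sd = area_weighted_stats([10.0, 20.0, 30.0], [100.0, 100.0, 50.0])

# Hypothetical pool of low-quality polygon IDs, reduced to 10k for class balance.
low_quality_ids = list(range(25_000))
balanced_low = subsample(low_quality_ids, 10_000)
```

Weighting by covered area means a polygon edge that clips a cell only contributes the clipped fraction of that cell's value, rather than the whole cell.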
We split the polygon data into two sets: 80% for training and 20% for testing. Models were then trained with 10-fold cross-validation on the 80% training data.
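The split and the cross-validation folds are pure index bookkeeping, sketched below in Python. The R console output further down appears to come from caret, where 10-fold CV is usually configured with `trainControl(method = "cv", number = 10)`; the code here does not reproduce caret. It assumes an unstratified random split and round-robin fold assignment, and the helper names and the row count of 20,000 are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def train_test_split_indices(n: int, train_frac: float = 0.8, seed: int = 1) -> Tuple[List[int], List[int]]:
    """Shuffle row indices 0..n-1 and cut them into a training set and a test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_frac)
    return idx[:cut], idx[cut:]


def kfold_assign(train_idx: Sequence[int], k: int = 10, seed: int = 2) -> List[List[int]]:
    """Shuffle the training indices and deal them round-robin into k folds."""
    shuffled = list(train_idx)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def cv_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (fit, validate) index lists so that each fold is held out exactly once."""
    for i, held_out in enumerate(folds):
        fit = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield fit, held_out


train, test = train_test_split_indices(20_000)
folds = kfold_assign(train, k=10)
for fit, validate in cv_splits(folds):
    print(len(fit), len(validate))  # a model would be fit on `fit` and scored on `validate`
```

The test indices never enter `kfold_assign`, so the held-out 20% stays untouched until the final evaluation.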
We trained random forest and boosted regression tree models, which performed similarly. Hyperparameter tuning had little effect on performance when validated against the test dataset.
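A comparison of the two model families under 10-fold cross-validation can be sketched with scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier`. These are swapped in for whatever R implementations were actually used (the importance output further down is labelled gbm). The data are synthetic from `make_classification`, with 78 predictors to mirror the 78 variables, and stratifying the folds by class is an assumption; the printed scores say nothing about the forest-quality models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the polygon table: 78 predictors, two roughly balanced classes.
X, y = make_classification(n_samples=600, n_features=78, n_informative=10, random_state=0)

# Stratified 10-fold CV (assumption): each fold keeps the class balance of the full data.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "boosted_trees": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy across the 10 held-out folds for each model family.
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean 10-fold CV accuracy = {score:.3f}")
```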
Overall model performance is moderate, at ~77% accuracy, slightly worse than the by-pixel models. See details below:
```
## Confusion Matrix and Statistics
##
##           Reference
## Prediction high  low
##       high 1412  465
##        low  419 1507
##
##                Accuracy : 0.7676
##                  95% CI : (0.7538, 0.7809)
##     No Information Rate : 0.5185
##     P-Value [Acc > NIR] : <2e-16
##
##                   Kappa : 0.5349
##
##  Mcnemar's Test P-Value : 0.1301
##
##             Sensitivity : 0.7712
##             Specificity : 0.7642
##          Pos Pred Value : 0.7523
##          Neg Pred Value : 0.7825
##              Prevalence : 0.4815
##          Detection Rate : 0.3713
##    Detection Prevalence : 0.4936
##       Balanced Accuracy : 0.7677
##
##        'Positive' Class : high
##
```
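Most of the statistics in the output above follow directly from the four cells of the confusion matrix. As a cross-check, the Python sketch below recomputes them with "high" as the positive class. `confusion_stats` is a hypothetical helper, McNemar's test is computed with a continuity correction, and the 95% CI and the Acc > NIR p-value are left out because they need exact binomial calculations.

```python
import math


def confusion_stats(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Summary statistics for a 2x2 confusion matrix with 'high' as the positive class.

    tp: predicted high, reference high    fp: predicted high, reference low
    fn: predicted low,  reference high    tn: predicted low,  reference low
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance from the prediction and reference totals (for Cohen's kappa).
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    # McNemar's chi-square with continuity correction, using only the off-diagonal cells.
    chi2 = (abs(fp - fn) - 1) ** 2 / (fp + fn)
    return {
        "accuracy": accuracy,
        "no_information_rate": max(tp + fn, fp + tn) / n,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "mcnemar_p": math.erfc(math.sqrt(chi2 / 2)),  # upper tail of chi-square with 1 df
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


stats = confusion_stats(tp=1412, fp=465, fn=419, tn=1507)
for name, value in stats.items():
    print(f"{name:>22}: {value:.4f}")
```

Because the classes are nearly balanced, accuracy and balanced accuracy almost coincide, and the non-significant McNemar test indicates the model does not clearly favour one type of error over the other.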
The 20 most important predictors, ranked by gbm variable importance, are:
```
## gbm variable importance
##
##   only 20 most important variables shown (out of 78)
##
##                                     Overall
## normalized_z_sd_mean               100.000
## dtm_10m_mean                        30.645
## normalized_z_sd_sd                  25.606
## canopy_openness_sd                  24.185
## openness_mean_mean                  21.914
## twi_mean                            20.370
## amplitude_sd_sd                     15.871
## solar_radiation_mean                15.589
## aspect_sd                           15.560
## vegetation_proportion_20m.25m_sd    15.077
## aspect_mean                         14.580
## vegetation_proportion_25m.50m_sd    14.434
## twi_sd                              13.240
## heat_load_index_sd                  11.710
## amplitude_mean_sd                   10.861
## amplitude_mean_mean                 10.379
## vegetation_proportion_20m.25m_mean  10.377
## vegetation_proportion_19m.20m_sd    10.223
## vegetation_proportion_19m.20m_mean   9.346
## vegetation_proportion_03m.04m_mean   9.260
```
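Each predictor name appears to combine an EcoDes-DK15 variable with the zonal statistic computed for it (`_mean` or `_sd`), so `normalized_z_sd_mean` would be the polygon mean of `normalized_z_sd`. Assuming that naming pattern, the Python sketch below groups the top 20 predictors by underlying variable. The helper `group_by_variable` is hypothetical, and the scores are copied from the output above.

```python
from collections import defaultdict
from typing import Dict

# Scaled gbm importances of the top 20 predictors, copied from the output above.
TOP20 = {
    "normalized_z_sd_mean": 100.000,
    "dtm_10m_mean": 30.645,
    "normalized_z_sd_sd": 25.606,
    "canopy_openness_sd": 24.185,
    "openness_mean_mean": 21.914,
    "twi_mean": 20.370,
    "amplitude_sd_sd": 15.871,
    "solar_radiation_mean": 15.589,
    "aspect_sd": 15.560,
    "vegetation_proportion_20m.25m_sd": 15.077,
    "aspect_mean": 14.580,
    "vegetation_proportion_25m.50m_sd": 14.434,
    "twi_sd": 13.240,
    "heat_load_index_sd": 11.710,
    "amplitude_mean_sd": 10.861,
    "amplitude_mean_mean": 10.379,
    "vegetation_proportion_20m.25m_mean": 10.377,
    "vegetation_proportion_19m.20m_sd": 10.223,
    "vegetation_proportion_19m.20m_mean": 9.346,
    "vegetation_proportion_03m.04m_mean": 9.260,
}


def group_by_variable(importance: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Split '<variable>_<statistic>' on the last underscore and group scores by variable."""
    groups: Dict[str, Dict[str, float]] = defaultdict(dict)
    for predictor, score in importance.items():
        variable, statistic = predictor.rsplit("_", 1)
        groups[variable][statistic] = score
    return dict(groups)


groups = group_by_variable(TOP20)
# List variables by their highest-scoring zonal statistic.
for variable, scores in sorted(groups.items(), key=lambda item: -max(item[1].values())):
    print(f"{variable:<34} {scores}")
```

Under this reading, the 20 predictors come from 14 distinct EcoDes-DK15 variables, six of which (normalized_z_sd, twi, aspect, amplitude_mean, vegetation_proportion_20m.25m and vegetation_proportion_19m.20m) enter with both their mean and their sd.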
Here is a map of the model predictions for the Aarhus region:
Note: only the training polygons that fall within the Aarhus region are shown. The model itself was trained on the nationwide training dataset.